noise tolerance
Multi-Modal Dataset Distillation in the Wild
Dang, Zhuohang, Luo, Minnan, Jia, Chengyou, Qian, Hangwei, Chang, Xiaojun, Tsang, Ivor W.
Recent multi-modal models have shown remarkable versatility in real-world applications. However, their rapid development encounters two critical data challenges. First, the training process requires large-scale datasets, leading to substantial storage and computational costs. Second, these data are typically web-crawled with inevitable noise, i.e., partially mismatched pairs, severely degrading model performance. To these ends, we propose Multi-modal dataset Distillation in the Wild, i.e., MDW, the first framework to distill noisy multi-modal datasets into compact clean ones for effective and efficient model training. Specifically, MDW introduces learnable fine-grained correspondences during distillation and adaptively optimizes distilled data to emphasize correspondence-discriminative regions, thereby enhancing distilled data's information density and efficacy. Moreover, to capture robust cross-modal correspondence prior knowledge from real data, MDW proposes dual-track collaborative learning to avoid the risky data noise, alleviating information loss with certifiable noise tolerance. Extensive experiments validate MDW's theoretical and empirical efficacy with remarkable scalability, surpassing prior methods by over 15% across various compression ratios, highlighting its appealing practicality for applications with diverse efficacy and resource needs.
Efficient PAC Learning of Halfspaces with Constant Malicious Noise Rate
Understanding noise tolerance of learning algorithms under certain conditions is a central quest in learning theory. In this work, we study the problem of computationally efficient PAC learning of halfspaces in the presence of malicious noise, where an adversary can corrupt both instances and labels of training samples. The best-known noise tolerance either depends on a target error rate under distributional assumptions or on a margin parameter under large-margin conditions. We show that when both types of conditions are satisfied, it is possible to achieve constant noise tolerance by minimizing a reweighted hinge loss. Our key ingredients include: 1) an efficient algorithm that finds weights to control the gradient deterioration from corrupted samples, and 2) a new analysis on the robustness of the hinge loss equipped with such weights.
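The idea of minimizing a reweighted hinge loss can be illustrated with a minimal sketch. This is not the paper's algorithm: the per-sample weights are simply supplied by the caller (the paper's weight-finding procedure is not reproduced), and the names `reweighted_hinge_loss` and `subgradient_step` are hypothetical. The sketch only shows how a weight of zero on a suspected corrupted sample removes its influence on both the loss and the subgradient.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def reweighted_hinge_loss(w, samples, weights):
    """Weighted average hinge loss of the linear classifier sign(<w, x>).

    samples: list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    weights: nonnegative per-sample weights (assumed given; how to find
             them is the paper's contribution and is not shown here).
    """
    total = sum(weights)
    loss = sum(q * max(0.0, 1.0 - y * dot(w, x))
               for (x, y), q in zip(samples, weights))
    return loss / total


def subgradient_step(w, samples, weights, lr):
    """One subgradient descent step on the reweighted hinge loss."""
    total = sum(weights)
    grad = [0.0] * len(w)
    for (x, y), q in zip(samples, weights):
        if y * dot(w, x) < 1.0:  # only margin violators contribute
            for i, xi in enumerate(x):
                grad[i] -= q * y * xi / total
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Two clean samples and one far-away sample with a flipped label.
samples = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([100.0, 0.0], -1)]
uniform = [1.0, 1.0, 1.0]
downweighted = [1.0, 1.0, 0.0]  # hypothetical weights that ignore the outlier

print(reweighted_hinge_loss([1.0, 0.0], samples, uniform))
print(reweighted_hinge_loss([1.0, 0.0], samples, downweighted))
```

With uniform weights the single corrupted point dominates the loss, while the down-weighted version evaluates the same classifier on the clean points only.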
Learning large-margin halfspaces with more malicious noise
We describe a simple algorithm that runs in time poly(n, 1/γ, 1/ε) and learns an unknown n-dimensional γ-margin halfspace to accuracy 1 − ε in the presence of malicious noise, when the noise rate is allowed to be as high as Θ(εγ log(1/γ)). Previous efficient algorithms could only learn to accuracy ε in the presence of malicious noise of rate at most Θ(εγ). Our algorithm does not work by optimizing a convex loss function. We show that no algorithm for learning γ-margin halfspaces that minimizes a convex proxy for misclassification error can tolerate malicious noise at a rate greater than Θ(εγ); this may partially explain why previous algorithms could not achieve the higher noise tolerance of our new algorithm.
Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version)
Naseer, Mahum, Hasan, Osman, Shafique, Muhammad
Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees for the trained NNs, using satisfiability solving and linear programming. We proposed FANNet, the first model checking-based framework for analyzing a broader range of NN properties. However, the state-space explosion associated with model checking entails a scalability problem, making FANNet applicable only to small NNs. This work develops state-space reduction and input segmentation approaches to improve the scalability and timing efficiency of formal NN analysis. Compared to the state-of-the-art FANNet, this enables our new model checking-based framework to reduce the verification's timing overhead by a factor of up to 8000, making the framework applicable to NNs even with approximately 80 times more network parameters. This in turn allows the analysis of NN safety properties using the new framework, in addition to all the NN properties already included with FANNet. The framework is shown to be efficiently able to analyze properties of NNs trained on healthcare datasets as well as the well-acknowledged ACAS Xu NNs.
UnbiasedNets: A Dataset Diversification Framework for Robustness Bias Alleviation in Neural Networks
Naseer, Mahum, Prabakaran, Bharath Srinivas, Hasan, Osman, Shafique, Muhammad
Performance of trained neural network (NN) models, in terms of testing accuracy, has improved remarkably over the past several years, especially with the advent of deep learning. However, even the most accurate NNs can be biased toward a specific output classification due to the inherent bias in the available training datasets, which may propagate to the real-world implementations. This paper deals with the robustness bias, i.e., the bias exhibited by the trained NN by having a significantly large robustness to noise for a certain output class, as compared to the remaining output classes. The bias is shown to result from imbalanced datasets, i.e., the datasets where all output classes are not equally represented. Towards this, we propose the UnbiasedNets framework, which leverages K-means clustering and the NN's noise tolerance to diversify the given training dataset, even from relatively smaller datasets. This generates balanced datasets and reduces the bias within the datasets themselves. To the best of our knowledge, this is the first framework catering to the robustness bias problem in NNs. We use real-world datasets to demonstrate the efficacy of the UnbiasedNets for data diversification, in case of both binary and multi-label classifiers. The results are compared to well-known tools aimed at generating balanced datasets, and illustrate how existing works have limited success while addressing the robustness bias. In contrast, UnbiasedNets provides a notable improvement over existing works, while even reducing the robustness bias significantly in some cases, as observed by comparing the NNs trained on the diversified and original datasets.
A Formal Approach to Identifying the Impact of Noise on Neural Networks
The past few years have seen an incredible rise in the use of smart systems based on artificial neural networks (ANNs), owing to their remarkable classification capability and decision making comparable to that of humans. Yet, as shown in Figure 1, the addition of even a small amount of noise to the input may trigger these networks to give incorrect results [13]. This is an alarming limitation of the ANNs, particularly for those deployed in safety-critical applications such as autonomous vehicles, aviation, and healthcare. For instance, consider a self-driving car using an ANN to perceive traffic signs as shown in Figure 2; the correct classification by the ANN in noisy real-world environments is crucial for the safety of humans and objects in the vicinity of the car.
[Figure caption: Magnitudes of image input and the noise applied to it.]
Noise tolerance of learning to rank under class-conditional label noise
Often, the data used to train ranking models is subject to label noise. For example, in web-search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the SERP, query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
Sample-Optimal PAC Learning of Halfspaces with Malicious Noise
We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a Matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions
Valle, Marcos Eduardo, Lobo, Rodolfo Anibal
Hypercomplex-valued neural networks, including quaternion-valued neural networks, can treat multidimensional data as a single entity. In this paper, we present the quaternion-valued recurrent projection neural networks (QRPNNs). Briefly, QRPNNs are obtained by combining the non-local projection learning with the quaternion-valued recurrent correlation neural networks (QRCNNs). We show that QRPNNs overcome the crosstalk problem of QRCNNs. Thus, they are appropriate to implement associative memories. Furthermore, computational experiments reveal that QRPNNs exhibit greater storage capacity and noise tolerance than their corresponding QRCNNs.
Introduction
The Hopfield neural network, developed in the early 1980s, is an important and widely-known recurrent neural network which can be used to implement associative memories [1, 2]. Successful applications of the Hopfield network include control [3, 4], computer vision and image processing [5, 6], classification [7, 8], and optimization [2, 9, 10]. Despite its many successful applications, the Hopfield network may suffer from a very low storage capacity when used to implement associative memories. Precisely, due to crosstalk between the stored items, the Hebbian learning adopted by Hopfield in his original work allows for the storage of approximately n/(2 ln n) items, where n denotes the length of the stored vectors [11]. For example, Personnaz et al. [12] as well as Kanter and Sompolinsky [13] proposed the projection rule to determine the synaptic weights of the Hopfield networks. The projection rule increases the storage capacity of the Hopfield network to n − 1 items. Another simple but effective improvement on the storage capacity of the original Hopfield networks was achieved by Chiueh and Goodman's recurrent correlation neural networks (RCNNs) [14, 15]. Briefly, an RCNN is obtained by decomposing the Hopfield network with Hebbian learning into a two-layer recurrent neural network.
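The contrast between Hebbian learning and the projection rule can be sketched for the classical real-valued, bipolar Hopfield network. This is an illustration under stated assumptions and not the quaternion-valued QRPNN of the paper: the function names are hypothetical, the Hebbian weights keep their diagonal for simplicity, and the projection weights are computed as W = U (UᵀU)⁻¹ Uᵀ for linearly independent patterns stored as the columns of U.

```python
import math


def hebbian_capacity(n):
    """Approximate Hebbian storage capacity n / (2 ln n) quoted in the text."""
    return n / (2.0 * math.log(n))


def hebbian_weights(patterns):
    """Hebbian (correlation) rule: W = (1/n) * sum_k u_k u_k^T."""
    n = len(patterns[0])
    return [[sum(u[i] * u[j] for u in patterns) / n for j in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve a z = b (a is p x p, b is p x m) by Gauss-Jordan elimination."""
    p = len(a)
    aug = [[float(v) for v in a[r]] + [float(v) for v in b[r]]
           for r in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def projection_weights(patterns):
    """Projection rule W = U (U^T U)^{-1} U^T (patterns must be independent)."""
    p, n = len(patterns), len(patterns[0])
    gram = [[sum(patterns[a][i] * patterns[b][i] for i in range(n))
             for b in range(p)] for a in range(p)]
    z = solve(gram, patterns)  # z = (U^T U)^{-1} U^T, shape p x n
    return [[sum(patterns[k][i] * z[k][j] for k in range(p))
             for j in range(n)] for i in range(n)]


def recall_step(w, x):
    """One synchronous update x <- sign(W x); a zero activation keeps x[i]."""
    out = []
    for i, row in enumerate(w):
        s = sum(wij * xj for wij, xj in zip(row, x))
        out.append(1 if s > 0 else -1 if s < 0 else x[i])
    return out


# Four strongly correlated patterns of length 6 (illustrative data only).
patterns = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, -1],
            [1, 1, 1, 1, -1, 1], [1, 1, 1, -1, 1, 1]]
for name, w in [("hebbian", hebbian_weights(patterns)),
                ("projection", projection_weights(patterns))]:
    stable = [recall_step(w, u) == u for u in patterns]
    print(name, stable)
```

On these correlated patterns the Hebbian weights pull the second pattern onto the all-ones pattern, which is the crosstalk effect described above, whereas the projection weights leave every stored pattern unchanged after one update.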
FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks
Naseer, Mahum, Minhas, Mishal Fatima, Khalid, Faiq, Hanif, Muhammad Abdullah, Hasan, Osman, Shafique, Muhammad
With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on "known inputs", these NNs can fail absurdly on the "unseen inputs", especially if these real-time inputs deviate from the training dataset distributions, or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase of adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for Leukemia classification. Our experimental results show $\pm 11\%$ noise tolerance for the given trained network, identify the most sensitive input nodes, and confirm the bias of the available training dataset.
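A noise-tolerance figure of the form ±k% can be read as the largest noise range in which no noise vector changes the network's output label. The sketch below illustrates that reading on a toy example and is not FANNet itself: plain exhaustive enumeration over integer-percent noise stands in for the model checker, a single linear layer stands in for the trained network, and all names and numbers are hypothetical.

```python
from itertools import product


def classify(weights, x):
    """Index of the largest output of one linear layer (stand-in for an NN)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def tolerates(weights, x, k):
    """True if every integer-percent noise vector in [-k, k]^n keeps the label."""
    label = classify(weights, x)
    for noise in product(range(-k, k + 1), repeat=len(x)):
        # Relative noise: each input is scaled by (100 + d) percent.
        noisy = [xi * (100 + d) / 100 for xi, d in zip(x, noise)]
        if classify(weights, noisy) != label:
            return False
    return True


def noise_tolerance(weights, x, k_max=50):
    """Largest k <= k_max (in percent) such that +/-k% noise is tolerated."""
    k = 0
    while k < k_max and tolerates(weights, x, k + 1):
        k += 1
    return k


# Toy two-class "network" and input (illustrative values only).
weights = [[1, 0], [0, 1]]
x = [100, 70]
print(noise_tolerance(weights, x))  # 17
```

Enumeration grows as (2k + 1)^n in the number of inputs n, which mirrors the state-space explosion that makes scalability a concern for model checking-based analysis.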